Average complexity of exact and approximate multiple string matching

نویسندگان

  • Gonzalo Navarro
  • Kimmo Fredriksson
چکیده

We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size σ cannot be less than Ω(n log σ (rm)/m). When we permit up to k insertions, deletions, and/or substitutions of characters in the occurrences of the patterns, the lower bound becomes Ω(n(k+ log σ (rm))/m). This generalizes previous single-pattern lower bounds of Yao (for exact matching) and of Chang and Marr (for approximate matching), and proves the optimality of several existing multipattern search algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Average-Optimal Multiple Approximate String Matching

We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits the optimality and is also competitive in practice. We present a second algorithm that is linear time and handles higher difference ratios. We show experimentally that our algorithms a...

متن کامل

Average-optimal string matching

The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...

متن کامل

On-Line Approximate String Matching with Bounded Errors

We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter so that the algorithm is allowed to miss occurrences with probability . This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the pr...

متن کامل

Improved Two-Way Bit-parallel Search

New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...

متن کامل

Approximate Boyer-Moore String Matching

The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Theor. Comput. Sci.

دوره 321  شماره 

صفحات  -

تاریخ انتشار 2004